Abstract

Multidrug-resistant Enterobacteriaceae are emerging as a serious infectious disease challenge. These strains can accumulate many antibiotic resistance genes through horizontal transfer of genetic elements, those for β-lactamases being of particular concern. Some β-lactamases are active on a broad spectrum of β-lactams including the last-resort carbapenems. The gene for the broad-spectrum and carbapenem-active metallo-β-lactamase NDM-1 is rapidly spreading. We present the complete genome of Klebsiella pneumoniae ATCC BAA-2146, the first U.S. isolate found to encode NDM-1, and describe its repertoire of antibiotic-resistance genes and mutations, including genes for eight β-lactamases and 15 additional antibiotic-resistance enzymes. To elucidate the evolution of this rich repertoire, the mobile elements of the genome were characterized, including four plasmids with varying degrees of conservation and mosaicism and eleven chromosomal genomic islands. One island was identified by a novel phylogenomic approach, which further indicated the cps-lps polysaccharide synthesis locus, where operon translocation and fusion were noted. Unique plasmid segments and mosaic junctions were identified. Plasmid-borne blaCTX-M-15 was transposed recently to the chromosome by ISEcp1. None of the eleven full copies of IS26, the most frequent IS element in the genome, had the expected 8-bp direct repeat of the integration target sequence, suggesting that each copy underwent homologous recombination subsequent to its last transposition event. Comparative analysis likewise indicates IS26 as a frequent recombinational junction between plasmid ancestors, and also indicates a resolvase site. In one novel use of high-throughput sequencing, homologously recombinant subpopulations of the bacterial culture were detected. In a second novel use, circular transposition intermediates were detected for the novel insertion sequence ISKpn21 of the ISNCY family, suggesting that it uses the two-step transposition mechanism of IS3. Robust genome-based phylogeny showed that a unified Klebsiella cluster contains Enterobacter aerogenes and Raoultella, suggesting the latter genus should be abandoned.
Introduction

Carbapenems are one of the few classes of antimicrobials that have remained effective against multidrug-resistant bacteria, but their utility is threatened by the emergence of carbapenem-resistant Enterobacteriaceae (CRE). Klebsiella pneumoniae is the most common CRE species in the United States, typically encountered as a hospital-acquired infection with high morbidity and mortality, and resistant to nearly all available antibiotics [1-4]. Enzymes that inactivate carbapenems are a major mechanism of resistance. The serine β-lactamase KPC, known since 2001, has become the most common carbapenemase in the U.S. and other countries [1]. A more recent concern is the carbapenem-active metallo-β-lactamase NDM-1, first identified in a K. pneumoniae isolate from 2008 [5]. Alarmingly, blaNDM-1 is often found on large conjugative plasmids along with additional antibiotic resistance determinants [6]. In some settings the gene region can form tandem repeats, elevating copy number [7]. The recent spread of blaNDM-1 both among different species and across a large geographic area has been remarkable and well documented [5-11].

Non-carbapenemase mechanisms of carbapenem resistance are 
also known. These include increasing efflux pump activity [10] 
and altering the profile of outer membrane porins that control 
access of carbapenems to the cell wall [12,13]. 

K. pneumoniae strain ATCC BAA-2146 (Kpn2146) was the first U.S. isolate found to encode NDM-1 together with a wide variety of additional antibiotic resistance determinants [14]. Susceptibility testing performed at ATCC found Kpn2146 to be resistant to every one of the 34 antimicrobial and antimicrobial/inhibitor combinations tested. While Kpn2146 resistance genes have been analyzed by both microarray [15] and (incomplete) genome sequencing [16,17], neither approach fully elucidated the complex Kpn2146 antibiotic resistance gene repertoire. For example, some Kpn2146 antibiotic resistance genes were unrecognized in the previous work, and duplicated genes were counted only once by microarray and collapsed onto one contig in the incomplete genome. Even when an incomplete genome does deliver the complete gene list, the question of how a pathogen accumulates such large collections of resistance genes requires the contextual information that comes from completing the genome. The complete genome is required to reveal gene duplication events, to determine plasmid vs. chromosomal gene location, and to apply phylogenomic methods to understand the evolution of the genome. In this study, we present the completed Kpn2146 genome, identifying four plasmids and enabling a detailed survey of its antibiotic-resistance determinants that fully explains its resistance profile. These determinants include 23 primarily plasmid-borne genes encoding antibiotic-resistance enzymes, eight of which are β-lactamase genes. It is crucial to understand how such richly endowed pathogens arise, which requires analysis of the mobile fraction of the genome. Accordingly, we surveyed genomic islands in the chromosome, mosaicism in the plasmids, and transposable elements throughout the genome.

Materials and Methods 

DNA preparation and sequencing 

Klebsiella pneumoniae ATCC BAA-2146 (Kpn2146) was isolated in 2010 from the urine of a U.S. hospital patient who had recently received medical care in India [14]. Genomic DNA was obtained from the American Type Culture Collection (ATCC) and resuspended in water. A previously described Illumina paired-end genomic sequence dataset from a single MiSeq run, after quality and primer sequence trimming, consisted of 3,023,757 read pairs, with reads averaging 88.3 bp [17]. A Pacific Biosciences (PacBio) sequence dataset was generated from 2 µg of genomic DNA at the Yale Genome Sequencing Center, which performed the 5-kb template preparation and sequenced the library on two SMRT cells, yielding 88,073 direct reads and 1744 circular consensus sequences (size distribution: mean 2408 bp; median 1948 bp; range 50-18951 bp; N50 3254 bp).

Genome assembly 

As detailed in File S1, the above MiSeq and PacBio datasets were sufficient for unambiguously assembling the complex genome with no need for additional PCR-based finishing. Novel software available at http://bioinformatics.sandia.gov/software/index.html was useful for visualizing MiSeq coverage and assembly branch points in the more challenging regions (Fig. S1 in File S1).

Annotation 

Protein-coding genes were initially identified and annotated using RAST [18], and RNA genes were annotated with careful attention to tRNA, tmRNA and rRNA genes; Rfam/Infernal [19] found 118 additional RNA genes and motifs that helped identify certain regulatory genes and sites, mobile elements, plasmid replication origins, and toxin/antitoxin systems. The Antimicrobial Resistance Database (ARDB) [20] was used to annotate antimicrobial resistance genes among the initially called genes, checking that hits did not have better matches to other gene families; the more recently updated ResFinder [21] added only blaNDM-1 to this list of resistance genes. Explaining the Kpn2146 antibiotic resistance profile required the identification of additional genes not called by RAST. ISs were annotated using ISFinder [22]. Intact integrons were named according to INTEGRALL [23]. The chromosomal origin of replication oriC was identified according to [24], and PCR tests [25,26] were adapted for in silico plasmid replicon typing. Observations on a high-copy group II intron, insertion sequences, and the lack of a CRISPR system are presented in File S1.
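In silico replicon typing amounts to locating the binding sites of published replicon-typing PCR primers in the assembled sequences and reporting the predicted amplicon sizes. A minimal Python sketch of that idea follows; it is not the software used in this study, the function names and toy primers are hypothetical, and it assumes exact primer matches (real PCR tolerates some mismatches). The `circular` flag appends the start of the replicon so that amplicons spanning the origin of a circular plasmid are also found.

```python
# Minimal in silico PCR sketch: exact primer matches only (hypothetical helper names).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan(strand, n, fwd, rc_rev, limit):
    """Amplicon lengths on one strand: a fwd site followed downstream by rc_rev."""
    products = []
    i = strand.find(fwd)
    while i != -1 and i < n:  # only count start sites inside the original template
        j = strand.find(rc_rev, i + len(fwd))
        while j != -1 and j + len(rc_rev) - i <= limit:
            products.append(j + len(rc_rev) - i)
            j = strand.find(rc_rev, j + 1)
        i = strand.find(fwd, i + 1)
    return products


def in_silico_pcr(template, fwd, rev, circular=False, max_len=5000):
    """Sorted predicted amplicon lengths for a primer pair, searching both strands."""
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    rc_rev = revcomp(rev)
    n = len(template)
    limit = min(max_len, n)  # no product can be longer than the template itself
    products = []
    for strand in (template, revcomp(template)):
        # For a circular replicon, append its start so products can span the origin.
        seq = strand + strand[:max_len] if circular else strand
        products += _scan(seq, n, fwd, rc_rev, limit)
    return sorted(products)
```

For example, with hypothetical primers `fwd = "ACCGTAGCTA"` and `rev = "GATCCTAGGA"`, the template `"TTTT" + fwd + "G" * 10 + revcomp(rev) + "TTTT"` yields a single predicted 30-bp product.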



Phylogenetic analysis 

The Kpn2146 genome was used for phylogenetic analysis, along with the 182 other Klebsiella reference genomes that were available at NCBI on December 20, 2013, and with five additional genomes (Enterobacter cloacae SCF1, Yokenella regensburgei ATCC 43003, Raoultella ornithinolytica B6 and two Enterobacter aerogenes genomes) included because they were originally placed in Klebsiella, or because a phylogenetic tree at PATRIC [27] showed that they are the closest available outgroup or fall within the Klebsiella clade. Multilocus sequence typing (MLST) was performed using K. pneumoniae data from http://www.pasteur.fr/mlst. Preliminary results showed that the 84 genomes of sequence type (ST) 258 formed a large tight clade together with the single ST512 genome; the five most divergent members of this clade were retained, while the other 80 genomes of this clade were excluded from further analysis. The 108 remaining genomes were aligned into 234,232 DNA sequence blocks using Mugsy v1.2.3 [28] with default settings. Blocks representing all ingroup genomes were selected and processed using Gblocks v0.91b [29] with the b5 = h option to remove ambiguously aligned regions, leaving 3476 blocks with a total of 2,118,733 aligned positions averaging 99.3% occupancy, which were concatenated into a supermatrix. A maximum likelihood tree was produced with RAxML v7.2.8 [30] using the GTRGAMMA substitution model. Node support values were taken from a bootstrap set of 150 trees produced similarly, using the fast (-x) bootstrapping function and autoFC bootstopping.
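The block selection and concatenation steps can be pictured with the Python sketch below. It is illustrative only: parsing Mugsy output, Gblocks trimming and RAxML inference are not reproduced, the function names are hypothetical, and "occupancy" is assumed here to mean the fraction of supermatrix cells that hold a nucleotide rather than a gap.

```python
# Sketch of alignment-block selection and supermatrix concatenation (hypothetical names).
# A block is a dict mapping taxon name -> aligned row; all rows in a block share one length.


def select_blocks(blocks, ingroup):
    """Keep only blocks that contain a row for every ingroup genome."""
    return [block for block in blocks if set(ingroup).issubset(block)]


def concatenate(blocks, taxa):
    """Concatenate blocks per taxon, padding taxa absent from a block with gaps."""
    parts = {taxon: [] for taxon in taxa}
    for block in blocks:
        width = len(next(iter(block.values())))
        for taxon in taxa:
            parts[taxon].append(block.get(taxon, "-" * width))
    return {taxon: "".join(rows) for taxon, rows in parts.items()}


def mean_occupancy(matrix):
    """Fraction of supermatrix cells that hold a residue rather than a gap ('-')."""
    rows = list(matrix.values())
    cells = len(rows) * len(rows[0])
    return sum(ch != "-" for row in rows for ch in row) / cells
```

In the workflow described above, the selected blocks would pass through Gblocks before concatenation; the sketch skips that step.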

Genomic islands 

Three methods were used to find chromosomal genomic islands: i) Islander identified att sites for islands integrated into a tRNA or tmRNA gene [31]; ii) PHAST identified regions enriched for phage genes [32]; and iii) we developed a novel phylogenomic method termed Learned Phyloblocks (http://bioinformatics.sandia.gov/software/index.html), in which the genome is divided into regions of shared evolutionary history termed "phyloblocks", and those phyloblocks that are "learned", on the basis of their enrichment among the training set of Islander and PHAST islands, are used to indicate additional islands. The chromosomes of Kpn2146 and the 11 other complete reference Enterobacter aerogenes and Klebsiella genomes were aligned using Mugsy. This alignment determined the "phylotype" for each position on the Kpn2146 chromosome, i.e., the presence/absence pattern of the nucleotide among the reference genomes. This partitioned the Kpn2146 chromosome into phyloblock intervals, defined as regions of uniform phylotype. Nonubiquitous phylotypes (those in which the sequence is not present in all 11 reference genomes) account for much (47.5%) of the Kpn2146 chromosome. This suggests that gene flux is high in Klebsiella and not entirely explained by integrative genomic islands. We reasoned that some nonubiquitous phylotypes might be more indicative than others of horizontally transferred islands, if there are particularly common "highways" of island transfer among Klebsiella strains, as have been found in broader studies of horizontal gene transfer [33]. Phylotypes were ranked by the fraction of their nucleotides in the Islander and PHAST training islands. Phylotypes whose occurrence in training islands was >25% were termed "learned phyloblocks", and accounted for 7.6% of the chromosome.
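The phyloblock and learned-phylotype definitions can be made concrete with the Python sketch below. It assumes that each chromosome position has already been assigned a phylotype, represented here as the frozenset of reference genomes sharing that nucleotide (an empty set marks Kpn2146-only sequence), and that island membership is given as one boolean per position. The function names and toy data are hypothetical; only the >25% threshold comes from the text.

```python
# Sketch of phyloblock partitioning and phylotype "learning" (hypothetical names).
from collections import Counter
from itertools import groupby


def phyloblocks(phylotypes):
    """Split per-position phylotypes into (start, end, phylotype) runs; end is exclusive."""
    blocks, pos = [], 0
    for phylotype, run in groupby(phylotypes):
        length = sum(1 for _ in run)
        blocks.append((pos, pos + length, phylotype))
        pos += length
    return blocks


def learned_phylotypes(phylotypes, in_training_island, ubiquitous, threshold=0.25):
    """Nonubiquitous phylotypes whose fraction of nucleotides in training islands > threshold."""
    total = Counter(phylotypes)
    in_islands = Counter(p for p, flag in zip(phylotypes, in_training_island) if flag)
    return {p for p, n in total.items()
            if p != ubiquitous and in_islands[p] / n > threshold}
```

On a toy 7-bp chromosome whose positions 2-4 share one nonubiquitous phylotype, with two of those three positions inside a training island, that phylotype forms one phyloblock (2, 5) and is "learned" (2/3 > 0.25).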

Phylotypes were analyzed with Mowgli [34], parsimoniously counting the gain/loss events required to reconcile our robust genome tree (Fig. 1) with its subtree of only the phylotype taxa. This allowed us to classify nonubiquitous phylotypes as either simple (explainable by a single gain/loss event) or complex (requiring multiple gains/losses). The complex class was significantly overrepresented among the learned phylotypes (36 of 38) relative
Figure 1. Klebsiella phylogeny. Tree for 108 genomes based on a 2.93-Mbp alignment, rooted at the midpoint of the outgroup (Ecl/Yre) branch. Nodes with <30% bootstrap support were combined, forming the multifurcated dashed line; otherwise, support values are shown only when <100%. Brackets: Kpn multilocus sequence type (ST). Inset: enlargement of the "core Kpn" phylogeny. Kpn2146 falls in a clade containing fellow ST11 strains Kpn JM45 and Kpn HS11286 and a tight clade (circled) of ST258 and ST512 strains. The ST258/ST512 clade is heavily sequenced and is represented here by only five of its most diverse members. Bold: complete genomes used for phyloblocks analysis. Species name abbreviations: Kpn, K. pneumoniae; Ksp, K. sp.; Kpl, K. cf. planticola; Kox, K. oxytoca; Kva, K. variicola; Eae, Enterobacter aerogenes; Ecl, E. cloacae; Ror, Raoultella ornithinolytica; Yre, Yokenella regensburgei.

doi:10.1371/journal.pone.0099209.g001
to the remaining phylotypes (183 of 246) (one-sided χ² test of proportions: P<0.005). Only two learned phylotypes were simple: Kpn2146-only and Kpn2146/Kpn HS11286-only.
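The simple/complex distinction and the enrichment test can be illustrated with the Python sketch below. It substitutes Fitch small parsimony on a rooted binary tree for the Mowgli reconciliation used in the study, so it is a simplified stand-in rather than the authors' method; the tree, leaf names and function names are hypothetical. The proportion test is a pooled one-sided z-test without continuity correction, which is assumed (not stated in the text) to correspond to the reported test.

```python
# Sketch: classify phylotypes by parsimony and test enrichment (hypothetical names).
from math import erfc, sqrt


def fitch_changes(tree, present):
    """Minimum presence/absence changes on a rooted binary tree (Fitch parsimony).

    tree: a leaf name (str) or a 2-tuple of subtrees;
    present: set of leaf names that carry the phylotype.
    """
    def visit(node):
        if isinstance(node, str):
            return ({1} if node in present else {0}), 0
        (s1, c1), (s2, c2) = visit(node[0]), visit(node[1])
        common = s1 & s2
        return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)
    return visit(tree)[1]


def classify(tree, present):
    """'simple' if one gain/loss event suffices, otherwise 'complex'."""
    return "simple" if fitch_changes(tree, present) <= 1 else "complex"


def one_sided_prop_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H1: p1 > p2 (no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 0.5 * erfc(z / sqrt(2))  # upper-tail standard normal probability
```

With the counts reported above (36 of 38 versus 183 of 246), `one_sided_prop_test(36, 38, 183, 246)` gives z ≈ 2.78 and P ≈ 0.003, in line with the reported P<0.005.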



PCR-based analysis of ISKpn21

While abundant sequence data mapped one ISKpn21 copy to the chromosome and a second to pKpn2146c, less abundant sequence data suggested additional copies, either in tandem repeat form or as free circles. PCR tests to distinguish these possibilities first re-examined each genomic locus. The chromosomal copy was amplified using primers Cf (CGGTC ATAGT GTTGA TGTGGG) and Cr (CATGT CTATT TGGTG AGAGA CGG), while the plasmid copy was amplified using Pf (GCTTC CATGA CTGGT TGGTG) and Pr (GATGC CAAGG GGGTA AAGTTG). Cross-copy PCRs (i.e., Pf/Cr and Cf/Pr) tested for artifacts. Other primers tested for circular ISKpn21: ISf (GGGGT TACAG GGCAT TTG) and ISr (GGTGT TTGAC GAGAG GATCC TG). PCR employed FailSafe enzyme mix in buffer E (Epicentre) with a program of 2 min at 95°C, 25 cycles (15 s at 95°C, 30 s at 55°C, 3 min at 68°C), and 7 min at 68°C. Products were run on 1.2% agarose E-gels (Life Technologies).
Plasmid mosaicism

The four plasmid sequences were queried against the July 29, 2013 nt database using BLASTN in default mode (i.e., task "megablast"), hitting 899 complete natural plasmids. Each query and subject was self-concatenated (to avoid circular origin issues), and BLASTN was repeated, identifying regions unique to each plasmid. To define unique mosaic junctions, each query hit boundary was tested for other hits spanning the boundary (beyond 10-bp tolerance windows).
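The junction criterion can be expressed in a few lines of Python. This sketch assumes BLASTN hits have already been parsed into 1-based (qstart, qend) query intervals and folded back from the self-concatenated query onto original plasmid coordinates; the function names are hypothetical, and ignoring boundaries within the tolerance window of the query ends is an added assumption to suppress edge effects. Only the 10-bp tolerance comes from the text.

```python
# Sketch: flag hit boundaries not spanned by any hit (hypothetical names).


def self_concatenate(seq):
    """Doubling a circular sequence lets matches that cross the origin align contiguously."""
    return seq + seq


def unique_junctions(hits, query_len, tol=10):
    """Boundaries of (start, end) hits that no hit spans by more than `tol` bp on each side.

    Boundaries within `tol` of either query end are skipped as edge effects.
    """
    boundaries = sorted({b for start, end in hits for b in (start, end)})
    junctions = []
    for b in boundaries:
        if b <= tol or b >= query_len - tol:
            continue
        spanned = any(start < b - tol and end > b + tol for start, end in hits)
        if not spanned:
            junctions.append(b)
    return junctions
```

For hits (1, 100) and (101, 200) on a 200-bp query, positions 100 and 101 are reported as a junction; adding a third hit (50, 150) that spans that boundary removes it.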
Accession

Raw MiSeq and PacBio reads were deposited at SRA (accessions SRR931757 and SRR1185120, respectively). Genomic sequences have GenBank accessions CP006659-CP006663 and can also be browsed at http://bioinformatics.sandia.gov/klebs/.



Results and Discussion 
Genome assembly using combined MiSeq and PacBio reads

We sequenced the genome of Klebsiella pneumoniae strain ATCC BAA-2146 (Kpn2146), the first U.S. isolate found to encode the NDM-1 metallo-β-lactamase. Assembly with an Illumina dataset alone was limited by poor coverage in GC-rich regions and by ambiguity at long repeats (Table S1 in File S1). However, adding a dataset of long but low-accuracy PacBio reads, together with custom software for visualizing Illumina reads (Fig. S1 in File S1), allowed unambiguous assembly into five circular replicons: a chromosome and four plasmids (Table 1).

Antibiotic resistance determinants 

ATCC has reported resistance of Kpn2146 to each of the 34 antimicrobial and antimicrobial/inhibitor combinations tested, including tests for 23 β-lactams (penicillins with or without inhibitors, cephalosporins, carbapenems and aztreonam), five fluoroquinolones, three aminoglycosides (tobramycin, amikacin and gentamicin), and four others (tetracycline, tigecycline, nitrofurantoin, and trimethoprim/sulfamethoxazole); see http://www.atcc.org/~/media/BA6C8F7C7C4C4649B2AEF501E51D76B8.ashx for the full list. Kpn2146 resistance genes have also been surveyed with a combination of microarray and amplicon sequencing [15]. The genome sequence fully rationalized the resistance profile, with ample evidence for one or more mechanisms explaining each observed antibiotic resistance, and supported the gene survey. It further identified previously untested genes (like qnrB9), allelic multiplicity (aac(6')-Ib, sul1, blaSHV-11 and blaCTX-M-15) and gene location (plasmid vs. chromosome), as well as housekeeping gene mutations (Table 2). These gene duplications can increase resistance; duplication of blaSHV-11 has been shown to increase amoxicillin resistance 16-fold [35].

Eight genes for β-lactamases representing all four Ambler classes were identified; together these explain the broad β-lactam and inhibitor resistance of Kpn2146. We further identified specific resistance genes for tetracycline, trimethoprim, sulfonamides, and macrolides, and multiple aminoglycoside resistance genes [36], including three aac(6')-Ib variants, one of which has been shown to confer additional low-level resistance to quinolones [37] in addition to the usual spectrum of aminoglycosides inactivated by AAC(6')-Ib, which includes tobramycin, amikacin, and gentamicin C1a and C2.

The complete genome also reveals certain housekeeping gene mutations that are related to drug resistance. Its GyrA Ser83>Ile and ParC Ser80>Ile combination has previously been found in K. pneumoniae isolates with high-level resistance to several fluoroquinolones [38]. QnrB9 of Kpn2146, like other plasmid-encoded quinolone resistance proteins, confers low-level resistance to fluoroquinolones and may facilitate selection of mutations in gyrA and parC associated with high-level resistance [39-42]. A frameshift mutation in the nitroreductase gene nfsA is likely responsible for the observed resistance to nitrofurantoin [43].

The above observations explain the entire known resistance profile, except the tigecycline resistance. Mechanisms previously suggested for tigecycline resistance are mutations in the gene for the ribosomal protein S10 (Kpn2146 has the wild-type allele) and mutations increasing the expression of the AcrAB/TolC efflux system [44,45]. One mutation class causing overexpression of this efflux system is inactivation of its repressor RamR; Kpn2146 has
such a ramR disruption (insertion of ISKpnlS) that can thereby 
explain the observed tigecycline resistance. Additional efflux systems (Table 2), such as the macrolide-specific efflux system MacAB/TolC [45], may contribute to the intrinsic spectrum of resistance, especially if overexpressed.

We also detected an early nonsense mutation that disrupts the porin gene ompK35, consistent with the many ESBL-producing K. pneumoniae strains that lack OmpK35 [12]. We do not, however, observe the concomitant loss of OmpK36 that significantly decreases susceptibility to meropenem and several cephalosporin β-lactams; ompK36 and ompK37 appear to be intact [46,47]. In a recently reported Klebsiella carbapenem resistance mode, the marR regulatory gene is inactivated and the yedS porin gene is active [13]; this mode is unlikely to pertain here since marR is intact and yedS is lacking in Kpn2146.

Class 1 integrons and integron fragments 

One third of the antibiotic resistance enzyme genes listed in 
Table 2, including all three of the aac(6')-Ib alleles, are associated 
with five scattered class 1 integrons or integron fragments (Fig. S2
in File S1). Four of these are on plasmids, often within
recognizable fragments of transposons, and the fifth is within a 
genomic island on the chromosome. We discuss below a case of 
cassette swapping where comparative analysis suggests the swap 
may have been mediated by homologous recombination rather 
than class 1 integron integrase action. 



Plasmid overview 

Plasmid copy numbers were measured relative to the chromosome from the MiSeq reads, using unique 21-mers; the extremely small pKpn2146a was high-copy, while pKpn2146b, pKpn2146c and the blaNDM-1 plasmid, pNDM-US, were large and low-copy (Table 1). The large plasmids carry most of the antibiotic resistance enzyme genes in the genome (Table 2). Some mobile genes with currently unknown function may eventually prove to be new virulence or resistance genes; hypothetical genes are enriched in the two largest plasmids relative to the total genome (Table S2 in File S1).
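A copy-number estimate of this kind can be sketched as below, assuming that relative copy number is the ratio of median read counts at k-mers occurring exactly once in the whole assembly. The exact statistic used in the study is not specified, so the median, the canonical (strand-independent) k-mer handling, and the function names are assumptions; k-mers spanning the origin of circular replicons are ignored for simplicity.

```python
# Sketch: relative replicon copy number from assembly-unique k-mers (hypothetical names).
from collections import Counter
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield the lexicographically smaller of each k-mer and its reverse complement."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def relative_copy_number(reads, replicons, reference="chromosome", k=21):
    """Median read count at a replicon's assembly-unique k-mers, relative to the reference."""
    in_assembly = Counter(km for seq in replicons.values() for km in canonical_kmers(seq, k))
    in_reads = Counter(km for read in reads for km in canonical_kmers(read, k))

    def depth(name):
        unique = [km for km in canonical_kmers(replicons[name], k) if in_assembly[km] == 1]
        return median(in_reads[km] for km in unique)

    ref_depth = depth(reference)
    return {name: depth(name) / ref_depth for name in replicons}
```

The default `k=21` mirrors the 21-mers mentioned above; a toy check with a small k and a "plasmid" read set three times deeper than the "chromosome" returns a relative copy number of 3.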

Conserved blaNDM-1 plasmid pNDM-US

The pNDM-US plasmid carrying blaNDM-1 (Fig. 2) was replicon-typed as IncA/C; it bears the IncA/C rep gene and iteron region, and encodes a ParAB partitioning system. Moreover, it encodes the complete set of proteins (TraABCDEFGHIKLNUVW, TrhF, DsbC, s043, s063, 123, 234, and 345) for the F-type conjugation pilus/Type IV secretion system, of the MOBH12 mobility class [47].

pNDM-US (140.8 kbp) is highly similar to numerous recently sequenced plasmids, yet unique in bearing a copy of the relatively rare IS3000 between ter and krfA. Recent insertion of IS3000 is further supported by its 5-bp direct repeat of target sequence (DR), the first clear measurement of its DR length, in agreement with its membership in the Tn3 family [48]. We describe the rather few differences, each discernible as a distinct DNA mobility event, between pNDM-US and its two closest known relatives: pNDM-KN (JN157804; 162.7 kbp) [49] and pNDM102337 (JF714412; 166.0 kbp), each of which shares a total of 137 kbp at >99.98% identity with pNDM-US. pNDM-KN has three large segments missing in pNDM-US: i) an ISEc23 insert, ii) a Tn7/restriction system segment, and iii) a 4-cassette integron in place of the single-cassette (aac(6')-Ib) integron. The second reference plasmid, pNDM102337, has i) the same 1-cassette integron as pNDM-US, ii) the Tn7/restriction system segment of pNDM-KN, and iii) a segment missing from both pNDM-US and pNDM-KN that carries additional resistance determinants and a full-length ISAba125 [50].

The integron in pNDM-KN and pNDM102337 is in a fragment of Tn1696 that has IS4321 inserted in its remaining IR. The presence of different gene cassettes in pNDM-KN (In578), pNDM-US (In46), and other Tn1696 variants might suggest recent integrase activity at this integron. However, an alternative explanation for integron cassette swapping is double homologous recombination in the long cassette-flanking regions that are conserved in most integrons, namely the upstream integrase gene (5'-CS, 1352 bp) and the downstream qacEΔ1-sul1-orf5 unit (3'-CS, 1616 bp) [51]. This latter suggestion is supported by the presence of three of the very few point mutational differences between pNDM-US and pNDM-KN near the att sites in these two flanks. In the 136,910 bp shared between pNDM-US and pNDM-KN there are ten sites of small-scale indel or base substitution; three of these are in the 5'-CS and 3'-CS, for an enrichment of (3/2968)/(7/133942) = 19.3-fold.
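The enrichment value can be restated as a worked equation: the 5'-CS and 3'-CS together span 1352 + 1616 = 2968 bp and carry 3 of the 10 variant sites, while the remaining 136,910 - 2968 = 133,942 bp of shared sequence carry the other 7:

```latex
E = \frac{3 / (1352 + 1616)}{7 / (136910 - 2968)}
  = \frac{3 / 2968}{7 / 133942}
  = \frac{3 \times 133942}{7 \times 2968}
  = \frac{401826}{20776}
  \approx 19.3
```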

ISEcp1 has transposed into pNDM-US, bringing its 2832-bp flanking segment bearing blaCMY-6, and has been inserted intergenically into the transfer operon tra. The pNDM-US blaNDM-1 region is found as in pNDM-KN and in many other Klebsiella plasmids; its interpretation as an immobile derivative of the mobile Tn125 of Acinetobacter baumannii strains has been discussed [52,53]; here Tn125 is truncated at one end by ISKpn14 and within the ISCR21 unit at the other end.
Table 2. Enzymes, efflux pumps, and mutations expected to confer resistance to antibiotics of clinical relevance^a.

| Enzyme^b | Gene location(s) | Coordinates | Resistance phenotype |
|---|---|---|---|
| NDM-1 (class B) | pNDM-US Tn125 | 122191-123003 | Penicillins, cephalosporins, carbapenems, inhibitor-resistant |
| SHV-11 (class A)^c | 1. pKpn2146b | 36313-37173 | Penicillins, some cephalosporins, inhibitor-sensitive |
| | 2. Chromosome | 2612996-2613856 | |
| CTX-M-15 (class A) | 1. pKpn2146b ISEcp1 | 47130-48005 | Penicillins, some cephalosporins, aztreonam, inhibitor-sensitive |
| | 2. Chromosome ISEcp1 | 5407530-5408405 | |
| TEM-1 (class A) | pKpn2146b Tn2 | 50827-51687 | Penicillins, some cephalosporins, inhibitor-sensitive |
| CMY-6 (class C) | pNDM-US ISEcp1 | 72203-73348 | Penicillins, some cephalosporins, inhibitor-resistant |
| OXA-1 (class D) | pKpn2146b ΔIn37 | 38798-39673 | Penicillins, inhibitor-resistant |
| AAC(3)-IIe | pKpn2146b | 41116-41976 | Gentamicin, tobramycin, netilmicin, sisomicin |
| AAC(6')-Ib (43) | pNDM-US In46 | 115114-115737 | Tobramycin, amikacin, netilmicin, sisomicin |
| AAC(6')-Ib (1) | pKpn2146b ΔInTn1331 | 82745-83350 | Tobramycin, amikacin, netilmicin, sisomicin |
| AAC(6')-Ib-cr (29) | pKpn2146b ΔIn37 | 38113-38712 | Tobramycin, amikacin, netilmicin, sisomicin, quinolones (low-level) |
| ANT(3")-Ia | Kpn23SapB In127 | 2297711-2298502 | Streptomycin, spectinomycin |
| APH(3")-Ib (StrA) | pKpn2146b ISCR2 | 53244-54047 | Streptomycin |
| APH(6)-Id (StrB) | pKpn2146b ISCR2 | 52408-53238 | Streptomycin |
| Sul2 | pKpn2146b ISCR2 | 54108-54923 | Sulfonamides |
| RmtC | pNDM-US ISEcp1 | 120100-120945 | Aminoglycosides (via rRNA modification) |
| Sul1 | 1. Kpn23SapB In127 | 2299007-2299846 | Sulfonamides |
| | 2. pNDM-US In46 | 116245-117084 | |
| DfrA14 | pKpn2146b In191 | 8281-8754 | Trimethoprim |
| QnrB9 | pKpn2146b | 26074-26742 | Quinolones, fluoroquinolones |
| Mph(A) | pKpn2146c | 16503-17408 | Macrolides, erythromycin |
| FosA | Chromosome | 667960-668379 | Fosfomycin |

| Efflux pump | Gene location | Coordinates | Probable substrate(s)^d |
|---|---|---|---|
| AcrAB-TolC | Chromosome | 1249681-1254043 | Aminoglycosides, β-lactams, tigecycline, macrolides |
| AcrEF-TolC | Chromosome | 4936203-4940465 | Minor role |
| EefABC | Chromosome | 5354323-5329922 | Chloramphenicol, tetracyclines, ciprofloxacin |
| MacAB-TolC | Chromosome | 1857393-1860445 | Macrolides |
| MdfA | Chromosome | 1781588-1782820 | Aminoglycosides, fluoroquinolones, chloramphenicol |
| MdtG,H,K,L,M,NOP | Chromosome | ^e | Many possible substrates (MFS superfamily pumps) |
| OqxAB | Chromosome | 4169609-4173960 | Chloramphenicol, fluoroquinolones, trimethoprim |
| EmrAB | Chromosome | 4218886-4221612 | Nalidixic acid, hydrophobic compounds |
| TetA(A) | pKpn2146c Tn1721 | 19168-20367 | Tetracyclines |

| Gene | Mutation | Coordinates | Resistance phenotype |
|---|---|---|---|
| gyrA (gyrase) | Ser83 TTC > Ile ATC | 3763583-3766216 | Quinolones, fluoroquinolones |
| parC (topoisomerase IV) | Ser80 AGC > Ile ATC | 4689294-4691552 | Quinolones, fluoroquinolones |
| nfsA (nitroreductase) | Frameshift | 1826275-1826998 | Nitrofurantoin |

^a Excluding the resistance enzyme for bleomycin, an antibiotic used clinically only as an antitumor agent.
^b Variant number from Table 1 of Ramirez et al. [37] is used to distinguish the AAC(6')-Ib variants.
^c Two silent differences between the two copies.
^d Probable efflux substrates identified from literature sources including ARDB; the substrate list is not comprehensive and in many cases has been deduced from organisms other than K. pneumoniae.
^e Mdt genes are scattered over the chromosome.

doi:10.1371/journal.pone.0099209.t002



Mosaic plasmid pKpn2146c 

pKpn2146c (Fig. 3) was replicon-typed as both IncFIIA and IncFIB. It was typed to IncFIIA using the copA RNA gene and the copB and rep protein genes, and to IncFIB through its IncFIB iteron region and rep gene. An IncD iteron region like that of the F plasmid was also identified.



pKpn2146c is a large mosaic plasmid that shares much of its sequence with the blaNDM-1-containing plasmid pKPX-1, including both the large copper/arsenic resistance region and the resistance gene mph(A) region. pKpn2146c is also enriched for hypothetical genes (Table S2 in File S1). Three of the eleven IS26 copies in the Kpn2146 genome occur in this plasmid (Table S3 in File S1).
Figure 2. pNDM-US. Key, color coding of genes, mobile elements, and unique regions and juxtapositions, with additional colors for non-gene features. Inner ring, representative long matches to other plasmids. abR, antibiotic resistance.
doi:10.1371/journal.pone.0099209.g002



Directly adjacent to the mph(A) and IS26 region is a Tn1721 [54] fragment bearing the tetA(A) resistance gene. This transposition junction is unique among plasmids in public databases. The other end of ΔTn1721 is truncated by an IS26 insertion.

Highly mosaic plasmid pKpn2146b 

pKpn2146b (Fig. 4) was replicon-typed as both IncFIA (iteron unit, oriS and rep gene) and IncR. It has a largely intact IncR repeat region located 34.5 kbp away from a locus with the rep, parA and parB genes and parS site. pKpn2146b additionally has a region of the iteron from the IncN plasmid R46 that is repeated 30.6 times, but without the IncN rep gene, which was apparently lost through IS26 insertion followed by homologous recombination.

pKpn2146b is the richest of the plasmids in resistance determinants (12 determinants; Table 2) and the most highly mosaic, with the highest number (six) of IS26 copies. Comparison with other plasmids shows evidence for an illegitimate recombination at the resolution site of the plasmid-encoded resolvase ResD (see Fig. 4 at coordinate 78900), where the IncR control region joins unusual sequence found elsewhere only in pK245 (DQ449578). Comparison also shows a particular pattern that we call "IS-flank switch"; one example is marked as "HR" near coordinate 38000 on Fig. 4, where homology to one reference
(plasmid pRA'IH7 1 2) begins precisely at one end of a long repeated 
region (IS26) and extends through the IS and well into one flank, while the same pattern occurs for the other flank with a second reference (plasmid pKDO1). We hypothesize that this IS-flank switch pattern resulted from homologous recombination between IS26-containing parents, as proposed previously [55]. This hypothesis of homologous recombination subsequent to two independent transposition events is supported by the failure to find the 8-bp target sequence direct repeat (DR) expected for a recent transposition of IS26. In fact, none of the six copies of IS26 in pKpn2146b, nor any of the other five copies elsewhere in the genome, contain the DR expected for recent insertion (Table S3 in File S1), suggesting that every IS26 copy in the genome has undergone homologous recombination more recently than transposition. We find another IS-flank switch pattern ("HR" at the top of Fig. 4), which we suspect explains how the IncN iterons lost their associated IncN rep gene.
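Checking for the expected target-site duplication reduces to comparing the bases immediately outside each IS end. A minimal Python sketch follows, assuming the IS boundaries (inverted repeat to inverted repeat) are already known and given as 0-based, end-exclusive coordinates on a linear sequence; the function name and the optional mismatch tolerance are illustrative assumptions, and wrap-around at a circular origin is not handled. Setting `dr_len=5` gives the corresponding check for the 5-bp DRs reported for IS3000 and ISEcp1.

```python
# Sketch: does an IS copy sit between two copies of a target-site direct repeat?


def flanking_direct_repeat(genome, is_start, is_end, dr_len=8, max_mismatches=0):
    """Return the left DR if the dr_len bp before the IS match the dr_len bp after it.

    is_start/is_end: 0-based, end-exclusive coordinates spanning the IS element.
    Returns None when no DR is found or a flank runs off the end of the sequence.
    """
    left = genome[max(is_start - dr_len, 0):is_start]
    right = genome[is_end:is_end + dr_len]
    if len(left) < dr_len or len(right) < dr_len:
        return None
    mismatches = sum(a != b for a, b in zip(left, right))
    return left if mismatches <= max_mismatches else None
```

An IS copy for which this returns None at `dr_len=8` matches the situation described above for every IS26 copy in Kpn2146.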

The blaSHV-11 gene originated in situ in the K. pneumoniae chromosome and has been transferred to plasmids at least twice, in both cases as a chromosomal fragment flanked by directly repeated IS26 copies [56,57]. The pKpn2146b copy of blaSHV-11 is like the prototype in plasmid pKPN4 (CP000649), except that one of the IS26 copies used to transmit this segment has been truncated by insertion of IS3000, which was then uniquely interrupted by ISEc22.

pKpn2146b has much of the blaTEM-1-containing Tn2 [58] (truncated by IS26 at one end, as found in other plasmids [55]), which is further disrupted by a blaCTX-M-15/ISEcp1 transposition unit



PLOS ONE | www.plosone.org | June 2014 | Volume 9 | Issue 6 | e99209

K. pneumoniae BAA-2146 Resistome and Mobilome



Figure 3. pKpn2146c. Key, color coding of genes, mobile elements, and unique regions and juxtapositions, with additional colors for non-gene 
features. Inner ring, representative long matches to other plasmids. Innermost black arrows, recent recombination events. HR, homologous 
recombination; abR, antibiotic resistance. 
doi:10.1371/journal.pone.0099209.g003



[59]. This pKpn2146b ISEcp1 copy has spawned a recent
transposition event moving blaCTX-M-15 to a chromosomal site.
Chromosomal blaCTX-M-15 has not been identified in any complete
genome, but has been reported at an undetermined locus in a
different multilocus sequence type [60]. This recent transposition
event from the plasmid used a different right end for the
transposing unit (1618-bp flank) than did the earlier insertion into
the plasmid Tn2 (1315-bp flank); the resulting chromosomal copy
has 100% identity with the plasmid parent and is flanked by a 5-bp
DR. A partial ISCR2 (disrupted tnp and ori) is found with its
frequently associated strA, strB and sul2 genes. The mercury-
resistance operon-carrying ΔTn6187 is only one arm of the full-
length Tn6187, but nonetheless has the same inverted repeats at
both ends as the full-length element, suggesting that it alone could
be a transposing element; however, it lacks the expected flanking
direct repeats and thereby conforms to the IS-flank switch pattern,
suggesting that its flanks may have been shuffled by homologous
recombination. The integron within Tn1331 (Fig. S2 in File S1)
[61] is truncated at one end by IS26 and at the other end
by ISKpn14, leaving aac(6')-Ib as its only intact resistance gene.



Mysterious plasmid pKpn2146a 

pKpn2146a (Fig. S3 in File S1) was replicon-typed as ColE,
encoding RNAs I and II. The ColE1 mobilization site (bom) was
determined by comparison to other ColE1 plasmids. The typical
short ColE1 proteins that affect ori (Rom protein) or bom-site (Mob
proteins) function could not be identified; indeed, none of its
potentially encoded proteins shows homology to any protein in
public databases. The most closely related known plasmid, pB1021
(NC_019989) from K. pneumoniae BB1090, shares the common
RNAII region and uniquely shares a second large portion of
pKpn2146a. This surprisingly short (2014-bp) plasmid was
supported by MiSeq coverage and verified by PCR (data not
shown).

Genomic islands determined by multiple approaches 

Plasmids frequently disseminate antibiotic resistance genes in 
Klebsiella, but genomic islands are also potential vehicles. Our 
program Islander [31] found six islands in tRNA/tmRNA genes,
including a tandem island pair at a tRNAj^^u gene. PHAST [32] 
confirmed three of these and identified four additional prophage- 
like islands, one precisely within the gene for the short regulatory 
RNA RybB. The 10 resulting islands accounted for 6.3% of the 
Kpn2146 chromosome. We used these 10 Islander/PHAST
islands (Table 3) as a training set for a phylogenomic approach to
find additional islands, based on the principle that islands tend to 
occur sporadically among closely related strains. The Kpn2146 
chromosome was partitioned into "phyloblocks", which we define 
as DNA intervals where all positions share the same phylotype, i.e., 
the same presence/absence profile among a given set of closely
related genomes. We selected phyloblocks that were enriched in
(i.e., "learned" from) the training islands. These learned
phyloblocks pointed to the island Kpn23SapB, which has an integrase
gene and an att site pair and was missed by Islander and PHAST.
Learned phyloblocks also pointed to the non-island genomic locus
cps-lps, described further below. An overview of learned phyloblocks
across the chromosome (Fig. 5) shows the tight mapping to cps-lps,
mobile islands and ISs.
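The phyloblock idea can be sketched in a few lines. This is a minimal sketch, assuming that a per-position presence/absence profile across the comparison genomes has already been derived (e.g., from whole-genome alignments); the names `phyloblocks` and `learned_phylotypes`, the fold-enrichment rule, and the `min_fold=2.0` threshold are all hypothetical, since the text does not state the enrichment statistic actually used.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# A phylotype: presence (True) / absence (False) of one reference position
# across a fixed, ordered set of closely related comparison genomes.
Phylotype = Tuple[bool, ...]
Block = Tuple[int, int, Phylotype]  # 0-based half-open [start, end) plus phylotype


def phyloblocks(profiles: Sequence[Phylotype]) -> List[Block]:
    """Partition the reference into maximal runs of positions sharing one phylotype."""
    blocks: List[Block] = []
    start = 0
    for i in range(1, len(profiles) + 1):
        if i == len(profiles) or profiles[i] != profiles[start]:
            blocks.append((start, i, profiles[start]))
            start = i
    return blocks


def learned_phylotypes(blocks: Sequence[Block], in_training_island: Sequence[bool],
                       min_fold: float = 2.0) -> Set[Phylotype]:
    """Phylotypes whose share of bases inside training islands is at least
    min_fold times the genome-wide share (a simple fold-enrichment rule)."""
    island_total = sum(in_training_island)
    if island_total == 0:
        return set()
    genome_share = island_total / len(in_training_island)
    bases: Dict[Phylotype, int] = defaultdict(int)
    island_bases: Dict[Phylotype, int] = defaultdict(int)
    for start, end, phylotype in blocks:
        bases[phylotype] += end - start
        island_bases[phylotype] += sum(in_training_island[start:end])
    return {pt for pt, n in bases.items() if island_bases[pt] / n >= min_fold * genome_share}


# Toy reference of 12 positions compared against 3 genomes (hypothetical data).
core, sporadic = (True, True, True), (False, True, False)
profiles = [core] * 4 + [sporadic] * 3 + [core] * 3 + [sporadic] * 2
training = [False] * 4 + [True] * 3 + [False] * 5
blocks = phyloblocks(profiles)
learned = learned_phylotypes(blocks, training)
# Learned blocks outside the training islands are candidate new islands.
new_candidates = [(s, e) for s, e, pt in blocks if pt in learned and not any(training[s:e])]
print(new_candidates)  # [(10, 12)]
```

The design mirrors the stated principle that islands occur sporadically among close relatives: a phylotype such as "absent, present, absent" learned from a known island flags other intervals with the same sporadic pattern.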

To summarize, the 11 islands identified here (Table 3) amount
to 365 kbp. Ten islands were precisely delimited, with an
integrase gene and both attL and attR sites identified. Two islands
had damage in the attR tRNA fragment, as has been observed
previously [27]. Only five of these islands were found in the closely
related strain K. pneumoniae HS11286.



The island Kpn23SapB has an In127 integron fragment
containing an aadA2 cassette (Fig. S2 in File S1). An upstream
IS26 insertion has displaced the integron Pc promoter, yet
generated a new plausible promoter with the -35 TTGCA from
IS26, a 17-bp spacer, and a -10 TTTCAT from the integron.
This aadA2 is the only island-borne resistance determinant
identified here. However, some mobile genes of currently
unknown function may eventually prove to be new virulence or
resistance genes; the islands are enriched in hypothetical genes
(Table S3 in File S1). Considering non-hypothetical genes, nine
islands primarily carry phage genes, while Kpn55F encodes
plasmid-like ParAB and some type IV secretion system functions
indicative of an integrative conjugative element (ICE). Islands
contain five of the six chromosomal group II intron copies.
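The hybrid IS26/integron promoter described above is judged plausible partly from its geometry: sigma-70-type promoters typically place the -35 and -10 boxes 17 bp apart, with some tolerance. The following is a minimal sketch, assuming exact motif matches and a hypothetical tolerated spacer range of 15 to 19 bp; `promoter_candidates` and the toy junction sequence are illustrative only, and real promoter prediction would score mismatched boxes rather than require exact ones.

```python
import re
from typing import List, Tuple


def promoter_candidates(seq: str, minus35: str = "TTGCA", minus10: str = "TTTCAT",
                        spacer_range: Tuple[int, int] = (15, 19)) -> List[Tuple[int, int, int]]:
    """List (start of -35 box, start of -10 box, spacer length) for every pairing
    of the two motifs whose spacer lies within spacer_range (0-based coordinates)."""
    seq = seq.upper()
    # Zero-width lookahead patterns so that overlapping occurrences are all found.
    starts35 = [m.start() for m in re.finditer(f"(?={re.escape(minus35)})", seq)]
    starts10 = [m.start() for m in re.finditer(f"(?={re.escape(minus10)})", seq)]
    hits = []
    for s35 in starts35:
        for s10 in starts10:
            spacer = s10 - (s35 + len(minus35))  # bases between the end of -35 and the start of -10
            if spacer_range[0] <= spacer <= spacer_range[1]:
                hits.append((s35, s10, spacer))
    return hits


# Hypothetical junction: IS26-derived -35 box, 17-bp spacer, integron-derived -10 box.
junction = "GG" + "TTGCA" + "A" * 17 + "TTTCAT" + "GG"
print(promoter_candidates(junction))  # [(2, 24, 17)]
```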

Operon fusion and translocation at the cps-lps 
polysaccharide synthesis locus 

Learned phyloblocks indicated, in addition to a new island, the
genomic locus of capsular polysaccharide (cps) and lipopolysac-
charide (lps) synthesis genes (Fig. 6). This region is not an integrase-
mobilized genomic island, yet the cps cluster is known to be so
highly varied as to suggest horizontal transfer of genes within the
array [62]. The capsule is the outermost cell surface layer, a key
Klebsiella pathogenicity determinant subject to immune surveillance.
In other Enterobacteriaceae, the large cps and lps gene clusters are
typically separate, but in Klebsiella, lps is found immediately
downstream of cps. Nevertheless, there normally appears to be
transcriptional separation between Klebsiella cps and lps: cps
terminates with the reverse-oriented gene uge, and an lps promoter
has been found in the large intergenic space between uge and lps
(Fig. 6A) [62]. The Kpn2146 cps-lps region has undergone a major
rearrangement with gene-regulatory consequences (Fig. 6B). The
terminal cps P3 transcription unit is deleted from its usual site,
fusing the lps operon to the main cps operon. Moreover, this cps P3
unit has been translocated to a nearby location, within a complex
array of insertion sequences. In this new location the P3 unit is
transcriptionally isolated, whereas at the usual location its
transcription could be supplemented by the upstream P2 promoter.
Deletion of a polysaccharide synthesis gene cluster by homologous
recombination between repeated manCB units has been noted before
[63], but in our case the translocation has preserved the deleted cps
subcluster.

Circular transposition intermediates of ISKpn21

Above we demonstrated transposition of blaCTX-M-15 from a
resident plasmid to the chromosome by sequence comparison.
Another way to assess the potential of a transposon to disseminate
antibiotic resistance genes is to identify active transposition
intermediates. Such intermediates have previously been found in
vivo as free molecules unintegrated into chromosomes or plasmids,
in circular, linear or tandem-repeat linear forms [64], in the two-
step transposition mechanism used by elements of the IS3, IS30,
IS21 and IS256 families. We present here a novel approach for
detecting circular transposition intermediates through high-
throughput sequencing. Examining the termini of ISKpn21, we
found MiSeq reads in which the two ISKpn21 ends were linked,
separated by the 5-bp direct repeat of one of the two integrated
copies (Table S3 and Fig. S4 in File S1). Possible explanations for
these sequences are i) that what we had assembled as single copies
were instead tandem genomic repeats, or ii) that these reads derive
from circular molecules free from the genome. We tested the
integrated ISKpn21 copies by PCR and found each to be present as a
single unit, not as a tandem (Fig. S4 in File S1). We also tested for a
genome-free circle (or possibly genome-free tandem) and observed
the indicated PCR product. The copy number of each circle and of
each end of its integrated parent ISKpn21 copy was measured,
yielding an average circle:parent ratio of 3.72% ± 0.84%, presum-
ing no sequencing bias. The pKpn2146c copy of ISKpn21 has
different direct repeat sequences at its two flanks, perhaps due to
recombination between different ancestral copies. Finding only the
left-end direct repeat in its circle sequence suggests, without
achieving statistical significance, that the left end of ISKpn21
preferentially attacks the right end during circularization. We
propose that ISKpn21, and perhaps the entire ISNCY family, uses the
two-step transposition mechanism of the IS3 family.
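The read-based circle detection and the circle:parent ratio can be sketched as follows. This is a minimal sketch, assuming exact substring matching of a known junction (IS right-end tail, then the 5-bp DR, then IS left-end head) on either strand, and taking the parent's coverage as the mean of its two integrated junctions; all function names and sequences are hypothetical, and the "±" is computed here as a sample standard deviation, which is an assumption about how the reported spread was derived. Like the reported ratio, it presumes no sequencing bias.

```python
from statistics import mean, stdev
from typing import Iterable, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_junction_reads(reads: Iterable[str], junction: str) -> int:
    """Count reads that contain the junction sequence on either strand."""
    rc = revcomp(junction)
    return sum(1 for read in reads if junction in read or rc in read)


def circle_parent_ratio(circle_reads: int, left_end_reads: int, right_end_reads: int) -> float:
    """Circle-junction read count relative to the mean read count over the
    parent copy's two integrated junctions (flank/left end, right end/flank)."""
    parent = (left_end_reads + right_end_reads) / 2
    if parent == 0:
        raise ValueError("no reads cover the parent copy's junctions")
    return circle_reads / parent


def summarize(ratios: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-copy ratios."""
    return mean(ratios), stdev(ratios)


# Hypothetical circle junction: IS right-end tail + 5-bp DR + IS left-end head.
circle_junction = "ACGT" + "TTTAA" + "GGCC"
read = "GG" + circle_junction + "TT"
reads = [read, revcomp(read), "ACGTGGCC"]
print(count_junction_reads(reads, circle_junction))  # 2
```

In practice reads would be aligned with error tolerance rather than matched exactly, since short-read error rates make strict substring matching undercount junctions.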

Using PacBio reads to detect homologously recombinant
subpopulations

Above we used sequence comparison to demonstrate homolo-
gous recombination at high-copy repeats as a mechanism for
reassorting resistance determinants. Here we present a new
method for measuring recombinant subpopulations in a bacterial
culture. Small numbers of PacBio reads disagreed with the
preponderant assembly pattern across the 8 copies of the rRNA
operon and the 8 copies of a group II intron (Fig. S5 in File S1).
To the extent that the PCR-free PacBio method is not expected or
known to produce in vitro homologous recombination artifacts,
our data indicate that approximately 4% of this bacterial
culture was recombinant across these repeats.
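The recombinant fraction can be expressed as a simple tally over long reads that cross a repeat copy. This is a minimal sketch, assuming each read's two flanks have already been assigned (e.g., by alignment) to the repeat copy whose unique flanking sequence they match; `SpanningRead`, `recombinant_fraction`, and the counts below are hypothetical illustrations, not the authors' data or code. A read whose left flank belongs to one copy and whose right flank belongs to another disagrees with the assembled arrangement.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SpanningRead:
    """A long read crossing one repeat copy; each flank is labeled with the
    repeat copy whose unique flanking sequence it matches (None if unassigned)."""
    left_copy: Optional[int]
    right_copy: Optional[int]


def recombinant_fraction(reads: Iterable[SpanningRead]) -> float:
    """Fraction of reads anchored on both sides whose two flanks belong to
    different repeat copies, i.e. that disagree with the assembly."""
    anchored = [r for r in reads if r.left_copy is not None and r.right_copy is not None]
    if not anchored:
        raise ValueError("no read is anchored in unique sequence on both sides")
    discordant = sum(1 for r in anchored if r.left_copy != r.right_copy)
    return discordant / len(anchored)


# Hypothetical tallies: 24 reads match the assembly, 1 joins copy 2's left
# flank to copy 5's right flank, and 2 reach unique sequence on one side only.
reads = [SpanningRead(1, 1)] * 24 + [SpanningRead(2, 5)] + [SpanningRead(3, None)] * 2
print(recombinant_fraction(reads))  # 0.04
```

Only reads anchored in unique sequence on both sides are informative, which is why long reads are required: they must exceed the length of the repeat (e.g., an rRNA operon) to resolve which copies' flanks they join.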

Klebsiella phylogeny revises taxonomy 

We expanded the phylogenetic analysis used in our learned
phyloblocks analysis to produce a robust genome-based phyloge-
netic analysis of Klebsiella (Fig. 1). This reveals a clade with
Kpn2146 and fellow members of multilocus sequence type (ST)
11, K. pneumoniae HS11286 and K. pneumoniae JM45, from which
sprang a tight clade of heavily sequenced K. pneumoniae ST258 and
ST512 hospital strains; Kpn2146 is the only blaNDM-1-containing
Figure 6. Operon translocation and fusion at the cps-lps polysaccharide synthesis locus. The cps P1, P2 and P3 promoters are taken from
[68], while a promoter (Plps) has been mapped in K. pneumoniae MGH 78578 to the intergenic space between uge and the first lps gene [69]. A) The
cps-lps region of K. pneumoniae 342, which is typical of Klebsiella. Genes of cps are in yellow (common in most strains) or blue (varying in gene
identity, count, and order); genes of lps are in red. The manCB unit (orange arrows) is occasionally found in cps and occasionally in lps, and here
unusually in both. The diamond represents the JUMPstart DNA/RNA motif at whose ops sequence RfaH is loaded onto the elongating RNA
polymerase in place of NusG, preventing Rho-based termination for the small number of long transcription units that are controlled by ops-RfaH, and
physically coupling the elongating RNA polymerase to the trailing ribosome [70]. B) Kpn2146 cps-lps. The boxed cps P3 unit has been deleted from its
usual site and moreover translocated to a nearby position, apparently by transposition and/or homologous recombination mechanisms; note the
complex pattern of surrounding IS insertions and the directly repeated flanking sequence copies (gray arrows). ΔIS, incomplete IS copy; dotted lines,
gene or IS interrupted by ISs; GT, glucosyl transferase; Hyp, hypothetical.
doi:10.1371/journal.pone.0099209.g006



member of this clade (or indeed of our entire tree). The nesting
of Enterobacter aerogenes and Raoultella within long-established
Klebsiella taxa, with 100% bootstrap support, suggests that
all should be subsumed under Klebsiella and that the genus
Raoultella, defined based on analysis of only two genes [62], should
be abandoned.

Conclusions 

A single relatively small Illumina read set, combined with a
PacBio set of longer but less accurate reads, was sufficient to
assemble the genome despite the numerous repeat and high-GC
regions, with no need for gap closure by PCR. Moreover, we
demonstrated direct detection of an active transposable element by
high-throughput sequencing. Our novel read-visualization tools
(http://bioinformatics.sandia.gov/software/index.html) were use-
ful for working through problematic areas, and this software was
developed into a greedy contig assembler.



The known extensive antibiotic-resistance profile of Klebsiella
pneumoniae ATCC BAA-2146 (Kpn2146) was explained, and
additional resistances, which remain to be tested experimentally,
were suggested by the genome sequence. Several mechanisms
were identified for the mobility of resistance genes: i) acquisition of
plasmids and genomic islands, ii) integron cassette swapping
(whole or partial integrons account for eight antibiotic-resistance
genes), iii) transposition events from chromosome to plasmid,
leading to greater disseminability of resistance, and vice versa,
leading to greater stability in the genome, and iv) homologous
recombination at high-copy repeats. Gaining more insight into
such key evolutionary mechanisms, beyond simply identifying
them, often comes through technological advances. Here we have
made novel use of high-throughput sequencing technologies to
study both transposition and homologous recombination.

Numerous mobile genetic elements were identified. The eleven
genomic islands were identified by three different methods, based
respectively on the preference of islands for tRNA gene integration
sites [31], the clustering of phage genes [32], and a novel
phylogenomic approach introducing phyloblocks (DNA segments
with shared phylogenetic profiles), which may be applicable in more
general studies of horizontal gene transfer (Fig. 6). A recent study
of the closely related ST258 K. pneumoniae, published while our
manuscript was under review, also found numerous islands and
indicated the cps locus as the major non-island chromosomal site of
variation among strains [65].

The Kpn2146 genome illustrates the massive arsenal of 
antibiotic-resistance genes, and agile repertoire of mobile genetic 
elements, that the emerging CRE bacteria have at their disposal 
for adapting to new challenges. Homologous recombination at 
multicopy sequences [66], site-specific recombination by resolvases 
[67], switching of integron cassettes, and transpositions have 
shaped Klebsiella plasmid mosaicism. 